14 research outputs found

    Deep filter banks for texture recognition, description, and segmentation

    Visual textures have played a key role in image understanding because they convey important semantics of images, and because texture representations that pool local image descriptors in an orderless manner have had a tremendous impact in diverse applications. In this paper we make several contributions to texture understanding. First, instead of focusing on texture instance and material category recognition, we propose a human-interpretable vocabulary of texture attributes to describe common texture patterns, complemented by a new describable texture dataset for benchmarking. Second, we look at the problem of recognizing materials and texture attributes in realistic imaging conditions, including when textures appear in clutter, developing corresponding benchmarks on top of the recently proposed OpenSurfaces dataset. Third, we revisit classic texture representations, including bag-of-visual-words and Fisher vectors, in the context of deep learning and show that these have excellent efficiency and generalization properties if the convolutional layers of a deep model are used as filter banks. We obtain in this manner state-of-the-art performance on numerous datasets well beyond textures, an efficient method to apply deep features to image regions, and benefits when transferring features from one domain to another. Comment: 29 pages; 13 figures; 8 tables.
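    The core idea of the paper -- use a filter bank to turn an image into dense local descriptors, then pool them with no regard for spatial layout -- can be sketched in a few lines of numpy. This is an illustrative toy, not the paper's method: the two hand-made gradient filters stand in for the convolutional layers of a pretrained CNN, and the mean/std pooling is a crude cousin of Fisher Vector pooling.

```python
import numpy as np

def filter_bank_descriptors(image, filters):
    """Densely apply a bank of 2D filters; each pixel ends up with one
    response per filter, i.e. a local descriptor (toy stand-in for the
    convolutional layers of a deep model used in the paper)."""
    h, w = image.shape
    fh, fw = filters.shape[1:]
    ph, pw = fh // 2, fw // 2
    padded = np.pad(image, ((ph, ph), (pw, pw)), mode="edge")
    responses = np.empty((h, w, len(filters)))
    for k, f in enumerate(filters):
        for i in range(h):
            for j in range(w):
                responses[i, j, k] = np.sum(padded[i:i + fh, j:j + fw] * f)
    return responses.reshape(-1, len(filters))  # one descriptor per pixel

def orderless_pool(descriptors):
    """Orderless pooling: discard spatial layout, keep first- and
    second-order statistics of the descriptor cloud."""
    return np.concatenate([descriptors.mean(axis=0), descriptors.std(axis=0)])

# Tiny demo with two hypothetical 3x3 gradient filters.
rng = np.random.default_rng(0)
image = rng.random((16, 16))
filters = np.stack([
    np.array([[-1, 0, 1]] * 3, float),    # horizontal gradient
    np.array([[-1, 0, 1]] * 3, float).T,  # vertical gradient
])
desc = filter_bank_descriptors(image, filters)
rep = orderless_pool(desc)
print(rep.shape)  # (4,)
```

    Because pooling is orderless, the resulting representation is invariant to where in the image each texture element appears, which is exactly what the paper exploits for texture recognition.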

    Describing Textures in the Wild

    Patterns and textures are defining characteristics of many natural objects: a shirt can be striped, the wings of a butterfly can be veined, and the skin of an animal can be scaly. Aiming to support this analytical dimension in image understanding, we address the challenging problem of describing textures with semantic attributes. We identify a rich vocabulary of forty-seven texture terms and use them to describe a large dataset of patterns collected in the wild. The resulting Describable Textures Dataset (DTD) is the basis for seeking the best texture representation for recognizing describable texture attributes in images. We port the Improved Fisher Vector (IFV) from object recognition to texture recognition and show that, surprisingly, it outperforms specialized texture descriptors not only on our problem, but also on established material recognition datasets. We also show that the describable attributes are excellent texture descriptors, transferring between datasets and tasks; in particular, combined with IFV, they significantly outperform the state of the art by more than 8 percent on both the FMD and KTH-TIPS2-b benchmarks. We also demonstrate that they produce intuitive descriptions of materials and Internet images. Comment: 13 pages; 12 figures. Fixed misplaced affiliation.
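    The Improved Fisher Vector ported here encodes a set of local descriptors against a diagonal-covariance GMM and then applies the two "improvements" of signed square-rooting and L2 normalisation. Below is a minimal numpy sketch of that encoding under those assumptions; the random descriptors and the hand-set two-component GMM are purely illustrative (in practice the GMM is fit to training descriptors).

```python
import numpy as np

def improved_fisher_vector(X, weights, means, variances):
    """Encode descriptors X (n, d) against a diagonal-covariance GMM.
    Gradients w.r.t. means and variances, followed by power and L2
    normalisation -- the two tweaks that make the Fisher Vector
    'Improved'."""
    n, d = X.shape
    K = len(weights)
    # Posterior responsibilities gamma (n, K), computed in log space.
    log_p = np.stack([
        -0.5 * np.sum((X - means[k]) ** 2 / variances[k]
                      + np.log(2 * np.pi * variances[k]), axis=1)
        for k in range(K)
    ], axis=1) + np.log(weights)
    log_p -= log_p.max(axis=1, keepdims=True)
    gamma = np.exp(log_p)
    gamma /= gamma.sum(axis=1, keepdims=True)

    parts = []
    for k in range(K):
        diff = (X - means[k]) / np.sqrt(variances[k])  # whitened residuals
        g_mu = (gamma[:, k:k + 1] * diff).sum(0) / (n * np.sqrt(weights[k]))
        g_var = (gamma[:, k:k + 1] * (diff ** 2 - 1)).sum(0) / (n * np.sqrt(2 * weights[k]))
        parts.extend([g_mu, g_var])
    fv = np.concatenate(parts)
    fv = np.sign(fv) * np.sqrt(np.abs(fv))    # power normalisation
    return fv / (np.linalg.norm(fv) + 1e-12)  # L2 normalisation

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))                      # 100 toy descriptors
w = np.array([0.5, 0.5])                           # illustrative 2-component GMM
mu = rng.normal(size=(2, 8))
var = np.ones((2, 8))
enc = improved_fisher_vector(X, w, mu, var)
print(enc.shape)  # (32,): 2 components x 2 gradient blocks x 8 dims
```

    The encoding length grows as 2Kd, so a small codebook already yields a high-dimensional, linearly separable representation -- part of why IFV transfers so well across tasks.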

    Neighbourhood Consensus Networks

    We address the problem of finding reliable dense correspondences between a pair of images. This is a challenging task due to strong appearance differences between the corresponding scene elements and ambiguities generated by repetitive patterns. The contributions of this work are threefold. First, inspired by the classic idea of disambiguating feature matches using semi-local constraints, we develop an end-to-end trainable convolutional neural network architecture that identifies sets of spatially consistent matches by analyzing neighbourhood consensus patterns in the 4D space of all possible correspondences between a pair of images, without the need for a global geometric model. Second, we demonstrate that the model can be trained effectively from weak supervision in the form of matching and non-matching image pairs, without the need for costly manual annotation of point-to-point correspondences. Third, we show that the proposed neighbourhood consensus network can be applied to a range of matching tasks, including both category- and instance-level matching, obtaining state-of-the-art results on the PF Pascal dataset and the InLoc indoor visual localization benchmark. Comment: In Proceedings of the 32nd Conference on Neural Information Processing Systems (NeurIPS 2018).
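    The "4D space of all possible correspondences" is simply the all-pairs correlation between two feature maps. A minimal numpy sketch of that tensor, plus the ratio-to-max re-weighting behind soft mutual-nearest-neighbour filtering, looks as follows (tiny random feature maps stand in for CNN features; the learned 4D convolutions of the actual network are omitted):

```python
import numpy as np

def correlation_4d(fa, fb):
    """All-pairs correlation between feature maps fa, fb of shape
    (h, w, d): c[i, j, k, l] = <fa[i, j], fb[k, l]>, i.e. the 4D
    match space the network reasons over."""
    return np.einsum("ijd,kld->ijkl", fa, fb)

def soft_mutual_nn_filter(c):
    """Re-weight each score by its ratio to the best score in both
    matching directions, so a match survives only if it is strong
    from A to B and from B to A."""
    r_a = c / (c.max(axis=(2, 3), keepdims=True) + 1e-12)  # best match of each (i, j)
    r_b = c / (c.max(axis=(0, 1), keepdims=True) + 1e-12)  # best match of each (k, l)
    return c * r_a * r_b

rng = np.random.default_rng(0)
fa = rng.random((4, 4, 16))
fb = rng.random((4, 4, 16))
c = correlation_4d(fa, fb)
filtered = soft_mutual_nn_filter(c)
print(c.shape)  # (4, 4, 4, 4)
```

    Because every operation here is differentiable, the same construction can sit inside an end-to-end trainable network, which is what makes the weakly supervised training of the paper possible.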

    InLoc: Indoor Visual Localization with Dense Matching and View Synthesis

    We seek to predict the 6 degree-of-freedom (6DoF) pose of a query photograph with respect to a large indoor 3D map. The contributions of this work are threefold. First, we develop a new large-scale visual localization method targeted for indoor environments. The method proceeds along three steps: (i) efficient retrieval of candidate poses that ensures scalability to large-scale environments, (ii) pose estimation using dense matching rather than local features to deal with textureless indoor scenes, and (iii) pose verification by virtual view synthesis to cope with significant changes in viewpoint, scene layout, and occluders. Second, we collect a new dataset with reference 6DoF poses for large-scale indoor localization. Query photographs are captured by mobile phones at a different time than the reference 3D map, thus presenting a realistic indoor localization scenario. Third, we demonstrate that our method significantly outperforms current state-of-the-art indoor localization approaches on this new challenging data.
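    Step (i) of the pipeline -- retrieving candidate poses -- boils down to ranking database images by similarity of a global descriptor and keeping the top-k. A minimal numpy sketch under that assumption (the real system uses learned image descriptors; random vectors stand in here, and the "query" is a noisy copy of database image 42):

```python
import numpy as np

def retrieve_candidates(query_desc, db_descs, k=3):
    """Rank database images by cosine similarity of global descriptors
    and keep the top-k as candidate poses for the later, more expensive
    dense-matching and verification steps."""
    q = query_desc / np.linalg.norm(query_desc)
    db = db_descs / np.linalg.norm(db_descs, axis=1, keepdims=True)
    sims = db @ q
    order = np.argsort(-sims)[:k]
    return order, sims[order]

rng = np.random.default_rng(0)
db = rng.normal(size=(100, 64))             # 100 toy database descriptors
query = db[42] + 0.1 * rng.normal(size=64)  # noisy copy of image 42
idx, scores = retrieve_candidates(query, db)
print(idx[0])  # 42: the true match ranks first
```

    Keeping only a shortlist is what makes the method scale: the costly dense matching and view-synthesis verification of steps (ii) and (iii) run on k candidates rather than the whole map.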

    NCNet: Neighbourhood Consensus Networks for Estimating Image Correspondences

    We address the problem of finding reliable dense correspondences between a pair of images. This is a challenging task due to strong appearance differences between the corresponding scene elements and ambiguities generated by repetitive patterns. The contributions of this work are threefold. First, inspired by the classic idea of disambiguating feature matches using semi-local constraints, we develop an end-to-end trainable convolutional neural network architecture that identifies sets of spatially consistent matches by analyzing neighbourhood consensus patterns in the 4D space of all possible correspondences between a pair of images, without the need for a global geometric model. Second, we demonstrate that the model can be trained effectively from weak supervision in the form of matching and non-matching image pairs, without the need for costly manual annotation of point-to-point correspondences. Third, we show that the proposed neighbourhood consensus network can be applied to a range of matching tasks, including both category- and instance-level matching, obtaining state-of-the-art results on the PF, TSS, InLoc and HPatches benchmarks.

    Recognizing describable attributes of textures and materials in the wild and clutter

    No full text
    Visual textures play an important role in image understanding because they are a key component of the semantics of many images. Furthermore, texture representations, which pool local image descriptors in an orderless manner, have had a tremendous impact in a wide range of computer vision problems, from texture recognition to object detection. In this thesis we make several contributions to the area of texture understanding. First, we add a new semantic dimension to texture recognition. Instead of focusing on instance or material recognition, we propose a human-interpretable vocabulary of texture attributes, inspired by studies in Cognitive Science, to describe common texture patterns. We also develop a corresponding dataset, the Describable Texture Dataset (DTD), for benchmarking. We show that these texture attributes produce intuitive descriptions of textures. We also show that they can be used to extract a very low-dimensional representation of any texture that is very effective in other texture analysis tasks, including improving the state of the art in material recognition on the most challenging datasets available today. Second, we look at the problem of recognizing texture attributes and materials in realistic uncontrolled imaging conditions, including when textures appear in clutter. We build on top of the recently proposed OpenSurfaces dataset, introduced by the graphics community, by deriving corresponding benchmarks for material recognition. In addition to material labels, we also augment a subset of OpenSurfaces with semantic attributes. Third, we propose a novel texture representation, combining recent advances in deep learning with the power of Fisher Vector pooling. We provide a thorough evaluation of the new representation, and revisit classic texture representations, including bag-of-visual-words, VLAD and Fisher Vectors, in the context of deep learning.
    We show that these pooling mechanisms have excellent efficiency and generalisation properties if the convolutional layers of a deep model are used as local features. We obtain in this manner state-of-the-art performance on numerous datasets, both in texture recognition and image understanding in general. We show through our experiments that the proposed representation is an efficient way to apply deep features to image regions, and that it is an effective way of transferring deep features from one domain to another.
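    Of the classic encodings revisited in the thesis, VLAD is the simplest to sketch: assign each local descriptor to its nearest codeword and accumulate the residuals. The numpy sketch below is illustrative only; the random descriptors and codebook stand in for real local features and a k-means vocabulary.

```python
import numpy as np

def vlad(X, codebook):
    """VLAD encoding: assign each descriptor in X (n, d) to its
    nearest codeword (K, d) and accumulate residuals, giving a K*d
    vector; finish with power and L2 normalisation as is standard."""
    # Squared distance from every descriptor to every codeword.
    d2 = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (n, K)
    assign = d2.argmin(axis=1)
    K, d = codebook.shape
    v = np.zeros((K, d))
    for k in range(K):
        members = X[assign == k]
        if len(members):
            v[k] = (members - codebook[k]).sum(axis=0)  # residual sum
    v = v.ravel()
    v = np.sign(v) * np.sqrt(np.abs(v))     # power normalisation
    return v / (np.linalg.norm(v) + 1e-12)  # L2 normalisation

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))          # 200 toy local descriptors
codebook = rng.normal(size=(4, 8))     # illustrative 4-word vocabulary
enc = vlad(X, codebook)
print(enc.shape)  # (32,)
```

    Like the Fisher Vector, VLAD is orderless, so the same encoder applies unchanged whether the descriptors come from hand-crafted filters or, as the thesis shows, from the convolutional layers of a deep model.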
